Abstract

A newly developed statistical pair potential based on the Distance-scaled,
Finite Ideal-gas REference (DFIRE) state is applied to structure selection in
unbound protein-protein docking. The performance of the DFIRE energy function
is compared with that of the well-established ZDOCK energy scores and the
RosettaDock energy function using the comprehensive decoy sets generated
by ZDOCK and RosettaDock. Despite significant differences in the functional
forms and complexities of the three energy scores, the differences in overall
performance for docking structure selection are small between DFIRE and
ZDOCK2.3 and between DFIRE and RosettaDock. This result is remarkable
considering that the single-term DFIRE energy function was originally designed
for monomer proteins, whereas the multiple-term energy functions of ZDOCK and
RosettaDock were specifically optimized for docking. This provides hope that
the accuracy of the existing energy functions for docking can be improved.

Keywords: potential of mean force, knowledge-based potential, energy score func- 
tions, reference state, binding affinity, and docking decoys. 



INTRODUCTION 

Docking prediction refers to the prediction of the structure of a protein-protein 
complex from the structures of individual subunits. This is a challenging task be- 
cause an unbound subunit often changes its conformation upon binding with its 
partner (induced fit). Docking prediction involves decoy generation and the selec- 
tion of the near-native structure from decoys using a filter and/or energy function. 
Thus, the success of docking prediction requires an efficient method that samples 
near-native conformations and an accurate energy function that ranks the near- 
native conformations as low energy conformations. Advances in sampling methods 
and energy functions for docking have been highlighted in several recent reviews.

Various energy functions have been used in docking prediction to separate
near-native structures from other structures. They are classified into two
groups, "integrated" and "edge" functions, according to whether they are used
directly in the sampling procedure or applied at its end [21]. Energy functions
are also classified by the methods used to obtain them. Physics-based energy
functions, derived from the laws of physics, have been applied to docking
(e.g. DARWIN, DOT, Hex, Guided Docking, TSCF, SmoothDock). Some docking
algorithms use semi-empirical energy functions that combine various physical
terms such as surface complementarity, van der Waals interaction, generalized
Born-surface area (GB/SA), and hydrogen bonding with optimized weight factors.
Examples are Dock, ICM-DISCO, PPD, GRAMM, FTDOCK, 3D-DOCK [23], AutoDock,
Surfdock, GAPDOCK, MolFit, BIGGER, Northwestern DOCK [30], ZDOCK and
RosettaDock [20]. Still others use statistical energy functions derived from
known protein structures. The use of energy functions is often accompanied by
clustering to incorporate the entropic contribution, as demonstrated in the
recent CAPRI (Critical Assessment of PRedicted Interactions) experiments.

Recently, a residue-specific, all-atom, distance-dependent potential of mean
force was extracted from the structures of single-chain proteins by using a
physical state of uniformly distributed points in finite spheres [the
distance-scaled, finite, ideal-gas reference (DFIRE) state] as the
zero-interaction reference state [53]. The new energy function has been shown
to be one of the best energy functions for selecting native structures from
decoys, predicting mutation-induced changes in stability [55] and loop
conformations, and reproducing the partitioning of hydrophobic and hydrophilic
residues within a single protein. More importantly, the physical reference
state of ideal gases appears to make the DFIRE energy function physically more
accurate, because its performance is largely independent of the structural
database (α or β proteins) used for energy extraction [51]. Moreover, an
initial application of the DFIRE-based "monomer" potential (i.e. the potential
extracted from the structures of single-chain proteins) to protein-protein
binding [29] suggests that the monomer potential is likely to be useful for
docking prediction, because it yields a high success rate for native structure
selection in docking decoy sets, discriminates true dimer interfaces from
crystal interfaces, and provides an accurate prediction of protein-protein
binding free energies.

In this paper, we further assess the ability of the "monomer" DFIRE energy
function to select near-native structures using a large benchmark of unbound
docking decoy sets [7]: the RosettaDock unbound docking decoy set [20] and the
ZDOCK1.3, ZDOCK2.1, and ZDOCK2.3 docking decoy sets. Each docking set contains
about 50 protein-protein complexes. We show that the unmodified version of the
DFIRE energy function achieves a success rate in ranking near-native structures
that is comparable to the success rates given by both the ZDOCK and RosettaDock
score functions. The implications of this result are discussed.

RESULTS 

RosettaDock Unbound Docking Decoy Set 

The DFIRE energy function is tested on the RosettaDock unbound docking
decoy set. As in Ref. [20], the selection capability of a score function is
characterized by the number of structures among the five lowest-energy
structures whose root-mean-squared deviation (rmsd) from the native complex
structure is less than 10 Å ($n_{\mathrm{rmsd}}$) or whose fraction of native
residue-residue contacts is greater than 25% ($n_{\mathrm{contact}}$). Gray et
al. further defined a discrimination as successful (a docking funnel is
detected) if $n_{\mathrm{rmsd}}$ (or $n_{\mathrm{contact}}$) is greater than or
equal to 3. Table 1 compares the performance of the DFIRE energy function with
that of RosettaDock on the docking decoys of 54 complexes. It shows that the
success rate based on $n_{\mathrm{rmsd}} \geq 3$ is 32/54 for DFIRE and 34/54
for RosettaDock. Similar success rates are obtained if the criterion
$n_{\mathrm{contact}} \geq 3$ is used. The overall performance of DFIRE remains
comparable to that of RosettaDock when the 38 complexes used by RosettaDock for
parameter optimization are removed. Comparable performance between the two
methods is also observed when the complexes are divided into enzyme/inhibitor,
antibody/antigen, and other complexes. This suggests that the finding is
robust. Figure 1 shows several examples in which the DFIRE energy function
produces a "funnel"-like shape when its energy score is plotted as a function
of rmsd from the native complex structure.
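
The funnel criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Decoy` record and the function names are hypothetical, the contact fraction is assumed to be stored as a number between 0 and 1, and nothing here is taken from the RosettaDock or DFIRE software.

```python
from dataclasses import dataclass


@dataclass
class Decoy:
    """Hypothetical record for one docking decoy."""
    energy: float            # score assigned by the energy function
    rmsd: float              # rmsd from the native complex, in angstroms
    contact_fraction: float  # fraction of native residue-residue contacts (0..1)


def funnel_counts(decoys, top_n=5, rmsd_cut=10.0, contact_cut=0.25):
    """Return (n_rmsd, n_contact) among the top_n lowest-energy decoys."""
    top = sorted(decoys, key=lambda d: d.energy)[:top_n]
    n_rmsd = sum(d.rmsd < rmsd_cut for d in top)
    n_contact = sum(d.contact_fraction > contact_cut for d in top)
    return n_rmsd, n_contact


def has_funnel(decoys, min_count=3):
    """Return (funnel by rmsd, funnel by contacts) for one target."""
    n_rmsd, n_contact = funnel_counts(decoys)
    return n_rmsd >= min_count, n_contact >= min_count
```

Counting the targets for which the first element of `has_funnel` is true would give a success rate of the same kind as the 32/54 and 34/54 figures quoted above.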

ZDOCK Docking Decoy Sets

The DFIRE energy function is applied to docking decoy sets generated by
different versions of ZDOCK. These unbound docking decoy sets contain 48
protein-protein complexes. In ZDOCK, the success rate is defined as the
fraction of the 48 complexes for which at least one near-native structure is
found within a given number of energy-ranked structures (see Methods). Figure 2
compares the success rates as a function of the number of energy-ranked
structures (or number of predictions, $N_p$) given by DFIRE, ZDOCK1.3,
ZDOCK2.1, and ZDOCK2.3. The results are reported for 16 antibody/antigen
complexes, 22 enzyme/inhibitor complexes, 10 other complexes, and all 48
complexes. For antibody/antigen complexes, the DFIRE energy function gives a
better success rate than all three versions of ZDOCK, except that at certain
intermediate numbers of predictions (around 10) it gives essentially the same
success rate as ZDOCK1.3 and ZDOCK2.3. For enzyme/inhibitor complexes, the
performance of the DFIRE energy function remains better than that of ZDOCK2.1
but is better than that of ZDOCK1.3 or ZDOCK2.3 only at small and large $N_p$.
For other complexes, the success rates based on top 1 ranking or top 1000
ranking are essentially the same for all four score functions. At other $N_p$
values, the performance of DFIRE is essentially the same as that of ZDOCK2.3,
better than that of ZDOCK1.3, and mixed compared with ZDOCK2.1. For all 48
complexes, the success rate of DFIRE is significantly higher (10% or more) than
that of ZDOCK2.1, higher than that of ZDOCK1.3 for $N_p < 5$ or $N_p > 30$, and
higher than that of ZDOCK2.3 for $N_p < 4$ or $N_p > 30$. The difference
between the results of DFIRE and those of ZDOCK2.3, however, is small.

Table 2 presents the best rank of near-native structures given by the different
methods. In all three decoy sets, DFIRE improves the rank of the best
near-native structure for more complexes than it worsens, relative to the ranks
given by the corresponding version of ZDOCK. More specifically, the ranks given
by DFIRE are higher for 23 protein complexes and lower for 12 protein complexes
than those given by ZDOCK1.3. The corresponding numbers are 27 higher and 9
lower relative to ZDOCK2.1, and 20 higher and 18 lower relative to ZDOCK2.3.
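
The higher/lower tallies quoted above amount to a per-target comparison of best ranks. A minimal sketch, assuming the best ranks are held in dictionaries keyed by PDB ID and that targets without any hit (shown as "-" in Table 2) are stored as `None`; the function name is hypothetical.

```python
def compare_ranks(reference, candidate):
    """Count targets where candidate ranks the best hit higher or lower.

    A smaller rank number means a higher (better) rank. Targets without a
    hit in either list (None) and ties are not counted.
    """
    higher = lower = 0
    for target, ref_rank in reference.items():
        cand_rank = candidate.get(target)
        if ref_rank is None or cand_rank is None:
            continue
        if cand_rank < ref_rank:
            higher += 1
        elif cand_rank > ref_rank:
            lower += 1
    return higher, lower


# Four rows taken from the ZDOCK1.3 decoy columns of Table 2.
zdock13 = {"1MLC": 134, "1FBI": 561, "1PPE": 1, "1DQJ": None}
dfire = {"1MLC": 64, "1FBI": 812, "1PPE": 1, "1DQJ": None}
print(compare_ranks(zdock13, dfire))
```

For these four rows the comparison counts one target ranked higher by DFIRE (1MLC) and one ranked lower (1FBI); the tie (1PPE) and the target without hits (1DQJ) are skipped.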

Another way to compare different energy score functions is to compare the
number of near-native structures (or number of hits) included within a given
number of lowest-energy structures (number of predictions, $N_p$). Table 3
compares the number of near-native structures within the top 1000 decoys given
by the different methods in the three decoy sets. The application of the DFIRE
energy function leads to more protein complexes having a greater number of
near-native structures within the top 1000 decoys. For example, the numbers of
near-native structures given by DFIRE are higher for 22 protein complexes and
lower for 12 protein complexes than those given by ZDOCK1.3. The corresponding
numbers are 29 higher and 7 lower relative to ZDOCK2.1, and 20 higher and 18
lower relative to ZDOCK2.3. The average number of near-native structures per
protein complex given by DFIRE is higher than that of ZDOCK2.1 but lower than
those of ZDOCK1.3 and ZDOCK2.3. We found that this is mainly caused by the
relatively high penalty for hard-core overlaps in the DFIRE energy function.
If a softer DFIRE energy function (see Methods) is used, the average number of
near-native structures per protein complex becomes higher than that given by
any of the three versions of ZDOCK. The softer DFIRE energy function also
further increases the number of protein complexes having a greater number of
near-native structures within the top 1000 decoys than given by ZDOCK1.3,
ZDOCK2.1, or ZDOCK2.3. We also applied the softer DFIRE energy function to the
RosettaDock decoy set but did not find a similar improvement. This result
indicates that the ZDOCK decoy sets contain significant van der Waals overlaps,
whereas those overlaps have been removed from the RosettaDock decoy set via
minimization.

DISCUSSION 

In this paper, we have compared the performance of DFIRE, RosettaDock,
and three versions of ZDOCK in selecting near-native structures from unbound
protein-protein docking decoys. The three energy functions were designed very
differently. The ZDOCK energy functions were designed and optimized for
docking, with shape complementarity as an important component. The energy score
in ZDOCK1.3 has three terms: grid-based shape complementarity, desolvation, and
electrostatics. The energy score in ZDOCK2.1 uses a pairwise shape
complementarity. In ZDOCK2.3, the pairwise shape complementarity is further
combined with desolvation and electrostatics. The RosettaDock energy function,
on the other hand, attempts to include many physical interactions via physical,
empirical, and/or knowledge-based approaches. It contains 11 terms that include
van der Waals (attractive and repulsive) interactions, implicit solvation,
surface-area solvation, hydrogen bonding, rotamer probability, residue-residue
pair probability, and electrostatic interactions (short- and long-range
attractive and repulsive components). In both ZDOCK and RosettaDock, the weight
parameters for the different terms were optimized for best performance. In
contrast, the DFIRE energy function has only one distance-dependent
pair-potential term, which contains no adjustable parameters (except the energy
value for van der Waals core overlaps). Despite the significant differences
among the three energy functions, the performance of the DFIRE energy function
is comparable to that of either RosettaDock or ZDOCK2.3 on the decoys generated
by them. This is remarkable considering that the DFIRE energy function was
originally designed for monomer proteins. It remains to be seen whether the
performance of DFIRE can be further improved if the DFIRE energy function is
used directly in sampling and minimization (work in progress).

The result that a single-term statistical pair potential performs similarly
to multiple-term energy functions provides new hope for going beyond the
existing accuracy of energy functions for docking, because some physical
interactions are not taken into account by the DFIRE energy function. One
obvious example is the multibody hydrogen-bonding interaction. Thus, it is
possible that incorporating some terms used in the RosettaDock or ZDOCK energy
functions may further improve the accuracy of the DFIRE energy function. On
the other hand, the matching performance of three very different energy
functions may signal that a bottleneck in the accuracy of energy functions has
been reached. One possible source of error in all three energy functions is
implicit solvation. If this is true, combining additional terms such as
hydrogen bonding and/or surface-accessible solvation with DFIRE is unlikely to
make a significant improvement in the accuracy of docking prediction. Work is
in progress to determine which scenario is true.

It should be noted that the DFIRE energy function is one of the best energy
functions for predicting protein-protein (peptide) binding free energies. Using
a combined database of 28 binding free energies collected by Gray et al.
(2003b) and 69 binding free energies, the correlation coefficient and the rmsd
between the measured binding free energies and those predicted by DFIRE are
0.79 and 2.35 kcal/mole, respectively (see Figure 3). This suggests that an
accurate prediction of binding free energy does not guarantee an accurate
docking prediction. This further suggests that the interaction energy missed
in the DFIRE energy function makes only a small contribution to the binding
free energy of the native complex structure but significantly destabilizes
alternative conformations. This highlights one of the biggest weaknesses of
statistical potentials: they are trained on native structures only.

METHODS 

DFIRE-based Potential and Soft DFIRE potential 

The derivation of the equations, the method for extracting the DFIRE-based
potential from a structure database, and the resulting potential have been
described previously. Here, we give a brief summary for completeness.

The atom-atom potential of mean force $u(i,j,r)$ between atom types $i$ and
$j$ that are a distance $r$ apart is given by

$$
u(i,j,r) =
\begin{cases}
-\eta R T \ln \dfrac{N_{\mathrm{obs}}(i,j,r)}
  {\left(\dfrac{r}{r_{\mathrm{cut}}}\right)^{\alpha}
   \dfrac{\Delta r}{\Delta r_{\mathrm{cut}}}\,
   N_{\mathrm{obs}}(i,j,r_{\mathrm{cut}})}, & r < r_{\mathrm{cut}}, \\[2ex]
0, & r \geq r_{\mathrm{cut}},
\end{cases}
\tag{1}
$$

where $\eta = 0.0157$, $R$ is the gas constant, $T = 300$ K, $\alpha = 1.61$,
$N_{\mathrm{obs}}(i,j,r)$ is the number of $(i,j)$ pairs within the distance
shell $r$ observed in a given structure database, $r_{\mathrm{cut}} = 14.5$ Å,
and $\Delta r$ ($\Delta r_{\mathrm{cut}}$) is the bin width at $r$
($r_{\mathrm{cut}}$). ($\Delta r = 2$ Å for $r < 2$ Å; $\Delta r = 0.5$ Å for
2 Å $< r <$ 8 Å; $\Delta r = 1$ Å for 8 Å $< r <$ 15 Å.) The prefactor $\eta$
was determined so that the regression slope between the predicted and
experimentally measured changes of stability due to mutation (895 data points)
is equal to 1.0. The exponent $\alpha$ for the distance dependence was obtained
from the distance dependence of the number of pairs of ideal-gas points in
finite spheres (finite ideal-gas reference state). Residue-specific atomic
types were used (167 atomic types). The number of observed atomic $(i,j)$ pairs
within the distance shell $r$ [$N_{\mathrm{obs}}(i,j,r)$] was obtained from a
structural database of 1011 non-homologous (less than 30% homology) proteins
with resolution < 2 Å, which was collected by Hobohm et al. (1992)
(http://chaos.fccc.edu/research/labs/dunbrack/culledpdb.html). This database
provides sufficient statistics for most distance bins (except near the
repulsive van der Waals regions). The average number of observed atomic pairs
per bin is 655. The sufficiency of the statistics is also reflected in the fact
that the results for structural discrimination are insensitive to the size of
the structural database [53] or the type of structural database [51] used to
generate the potential. The potential $u(i,j,r)$ is set to $10\eta$ if
$N_{\mathrm{obs}}(i,j,r) = 0$. For the soft-DFIRE energy function, the value is
set to $2\eta$.
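
As an illustration of Eq. (1), the sketch below evaluates the potential for one atom-type pair and one distance bin from pair counts. It is not the authors' code. Several choices are assumptions made only for this example: the gas constant is expressed in kcal/(mol K), the bin is represented by a single distance `r`, the bin-width boundaries at exactly 2 Å and 8 Å are assigned to the upper range, and the function names are hypothetical.

```python
import math

R_GAS = 1.987204e-3   # gas constant in kcal/(mol K); the unit choice is an assumption
ETA = 0.0157          # prefactor eta from the text
ALPHA = 1.61          # distance exponent alpha from the text
TEMPERATURE = 300.0   # K
R_CUT = 14.5          # cutoff distance in angstroms


def bin_width(r):
    """Bin width (angstroms) at distance r, following the binning in the text."""
    if r < 2.0:
        return 2.0
    if r < 8.0:
        return 0.5
    return 1.0


def dfire_u(n_obs_r, n_obs_cut, r, soft=False):
    """Eq. (1) for one atom-type pair at a bin represented by distance r.

    n_obs_r   -- observed pair count in the bin at r
    n_obs_cut -- observed pair count in the bin at r_cut
    soft      -- use the soft-DFIRE value (2*eta) for empty bins
    """
    if r >= R_CUT:
        return 0.0
    if n_obs_r == 0:
        return (2.0 if soft else 10.0) * ETA
    if n_obs_cut <= 0:
        raise ValueError("the reference bin at r_cut must contain pairs")
    expected = (r / R_CUT) ** ALPHA * (bin_width(r) / bin_width(R_CUT)) * n_obs_cut
    return -ETA * R_GAS * TEMPERATURE * math.log(n_obs_r / expected)
```

A pair observed more often than the ideal-gas reference predicts gets a negative (favorable) value, a pair observed less often gets a positive value, and an empty bin falls back to the fixed overlap penalty.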

Binding Free Energy and Structure Selections from Docking Decoys 

The total atom-atom potential of mean force, $G$, for each structure is given by

$$
G = \frac{1}{2} \sum_{i,j} u(i,j,r_{ij}), \tag{2}
$$

where the summation is over atomic pairs that are not in the same residue and a
factor of 1/2 is used to avoid double counting of residue-residue and atom-atom
interactions. The binding free energy of a dimer AB is obtained as follows:

$$
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - (G_A + G_B). \tag{3}
$$

Since the structures of the monomers are approximated as rigid bodies and the
residues at the interface contribute most to $\Delta G_{\mathrm{bind}}$,
Eq. (3) can be further simplified to

$$
\Delta G_{\mathrm{bind}} = \frac{1}{2} \sum_{i,j}^{\mathrm{interface}} u(i,j,r_{ij}), \tag{4}
$$

where the summation is over any two atoms belonging to an "interacting" residue
pair from different chains at the interface. We follow the definition, due to
Lu et al. (2003), in which an interacting residue pair is a pair of residues
from different chains that have at least one pair of heavy atoms within 4.5 Å
of each other. The binding free energy $\Delta G_{\mathrm{bind}}$ is calculated
for each docking decoy, and the ranking is based on the value of the calculated
binding free energy.
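
The interface sum of Eq. (4) can be sketched as follows. This is an illustration, not the authors' implementation. The data layout (a chain as a list of residues, each residue a list of `(atom_type, (x, y, z))` heavy atoms) and the callable `u(type_i, type_j, r)` are assumptions for this example. The sketch visits each inter-chain atom pair once, which is taken here to be equivalent to the factor 1/2 applied to a sum over ordered pairs.

```python
import math
from itertools import product


def binding_energy(chain_a, chain_b, u, contact_cut=4.5):
    """Sum u over all atom pairs of interacting residue pairs (Eq. 4).

    chain_a, chain_b -- lists of residues; a residue is a list of
                        (atom_type, (x, y, z)) heavy atoms
    u                -- callable u(type_i, type_j, r) returning an energy
    contact_cut      -- a residue pair interacts if any heavy-atom pair
                        is closer than this distance (angstroms)
    """
    total = 0.0
    for res_a, res_b in product(chain_a, chain_b):
        pairs = [
            (type_a, type_b, math.dist(xyz_a, xyz_b))
            for (type_a, xyz_a), (type_b, xyz_b) in product(res_a, res_b)
        ]
        if any(r < contact_cut for _, _, r in pairs):
            # Every atom pair of an interacting residue pair contributes,
            # not only the pairs inside the contact cutoff.
            total += sum(u(type_a, type_b, r) for type_a, type_b, r in pairs)
    return total
```

Ranking a decoy set then reduces to sorting the decoys by this value, lowest first.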

Unbound Docking Decoy Sets 

The first decoy set (RosettaDock set) consists of 54 decoy sets [version 1.0 of
Chen-Mintseris-Janin-Weng's benchmark [7]] downloaded from the website
http://graylab.jhu.edu/docking/decoys/. The decoy sets were generated from
random starting positions of the unbound monomer components superimposed on the
native bound complex structure, followed by the RosettaDock protocol, to create
a diffuse spatial distribution that covers a reasonable area (20 Å radius rmsd)
with moderate density around the native position. Each decoy set has 1000
decoys per protein complex. [For a more detailed description, see Gray et al.
(2003a).]

The second group of decoy sets (ZDOCK decoy sets) consists of 48
protein-protein complexes [version 0.0 of the benchmark [7]] downloaded from
the website http://zlab.bu.edu/~rong/dock/software.shtml. The decoy sets were
generated using a fast Fourier transform (FFT) algorithm with three different
scoring functions: ZDOCK1.3, which combines grid-based shape complementarity
(GSC) with desolvation and electrostatics (GSC+DE+ELEC) [9]; ZDOCK2.1, with
pairwise shape complementarity (PSC); and ZDOCK2.3, with combined PSC,
desolvation, and electrostatics (PSC+DE+ELEC). That is, we have three different
sub-decoy sets, and each sub-decoy set has 2000 decoys per protein complex.

Performance Evaluation 

In the RosettaDock unbound decoy set, the rmsd between a decoy and the native
structure is calculated over the Cα atoms of the smaller docking partner
(ligand) in the fixed coordinate frame of the larger partner (receptor). The
native residue-residue contact fraction is calculated as the fraction of the
contacts (residue pairs with at least one inter-residue heavy-atom pair closer
than 4 Å) identified in the native structure that are also present in the decoy
structure. The performance of a scoring function is evaluated by the number of
energy funnels formed. The unbound perturbation funnels are quantified by
examining the five lowest DFIRE energy decoys. If at least three of these
structures either have less than 10 Å rmsd from the native structure or a
native residue-residue contact fraction above 25%, a successful energy funnel
exists for this target. [For a more detailed description of the above
criterion, see Gray et al. (2003a).]
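
The native residue-residue contact fraction defined above can be sketched as a set comparison. The data layout (a dictionary from residue identifier to a list of heavy-atom coordinates) and the function names are assumptions made for this example.

```python
import math


def residue_contacts(receptor, ligand, cutoff=4.0):
    """Set of (receptor residue, ligand residue) pairs in contact.

    receptor, ligand -- dicts mapping a residue id to a list of
                        heavy-atom (x, y, z) coordinates
    cutoff           -- contact distance in angstroms
    """
    contacts = set()
    for rec_id, rec_atoms in receptor.items():
        for lig_id, lig_atoms in ligand.items():
            if any(math.dist(a, b) < cutoff for a in rec_atoms for b in lig_atoms):
                contacts.add((rec_id, lig_id))
    return contacts


def native_contact_fraction(native_contacts, decoy_contacts):
    """Fraction of native contacts that are also present in the decoy."""
    if not native_contacts:
        raise ValueError("native structure has no inter-chain contacts")
    return len(native_contacts & decoy_contacts) / len(native_contacts)
```

A decoy counts toward the contact-based funnel criterion when this fraction exceeds 0.25.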

In the ZDOCK docking decoys, the rmsd between a decoy and the native structure
is calculated over the Cα atoms of the interface residues, which are the
residues of receptor-ligand residue pairs with at least one inter-residue
heavy-atom pair closer than 10 Å. A hit (near-native structure) is defined as a
decoy with rmsd < 2.5 Å. The performance of a scoring function is evaluated by
using the success rate and the hit count, as defined by Rong and Weng (2003).
The success rate is defined as the percentage of test cases among the 48
targets for which at least one hit has been found within a given number of
lowest-energy structures (predictions) for each test case ($N_p$). The hit
count is the average number of hits (near-native structures) per target within
a given $N_p$. The success rate depends only on the best rank of a hit in each
protein-protein complex decoy set. The hit count characterizes the ability to
retain near-native structures for post-processing within a given number of
allowed candidates.
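
The two measures defined above can be sketched as follows, assuming each target is given as a list of `(energy, interface_rmsd)` pairs, one per decoy. The function names and data layout are hypothetical and chosen only for this illustration.

```python
def hits_in_top(decoys, n_p, hit_cut=2.5):
    """Number of hits (interface rmsd < hit_cut) among the n_p lowest-energy decoys."""
    top = sorted(decoys, key=lambda d: d[0])[:n_p]
    return sum(rmsd < hit_cut for _, rmsd in top)


def success_rate(targets, n_p, hit_cut=2.5):
    """Percentage of targets with at least one hit within the top n_p decoys."""
    successes = sum(hits_in_top(decoys, n_p, hit_cut) > 0 for decoys in targets)
    return 100.0 * successes / len(targets)


def hit_count(targets, n_p, hit_cut=2.5):
    """Average number of hits per target within the top n_p decoys."""
    return sum(hits_in_top(decoys, n_p, hit_cut) for decoys in targets) / len(targets)
```

Evaluating `success_rate` over a range of `n_p` values gives a curve of the kind plotted in Figure 2, and `hit_count` with `n_p = 1000` corresponds to the averages reported in Table 3.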

Table 1: Comparison of performance on the RosettaDock unbound perturbation
decoys of 54 complexes.

PDB ID (a)    RosettaDock (d,e)    DFIRE (f,g)

Enzyme/inhibitor complexes (b)
1ACB          2/1                  4/4
1AVW          5/5                  5/5
1BRC (c)      1/2                  2/2
1BRS (c)      4/4                  5/5
1CGI          4/4                  3/3
1CHO          3/3                  5/5
1CSE          2/0                  1/2
1DFJ          4/4                  3/3
1FSS          5/5                  5/0
1MAH (c)      5/5                  5/5
1TGS (c)      5/5                  5/5
1UGH          5/4                  5/3
2KAI (c)      4/4                  4/4
2PTC          2/2                  0/1
2SIC          5/5                  5/5
2SNI          4/4                  5/5
1PPE (c)      5/5                  5/5
1STF (c)      5/5                  5/5
1TAB (c)      5/5                  3/3
1UDI (c)      5/5                  5/5
2TEC (c)      5/5                  4/5
4HTC (c)      5/5                  5/5

Antibody/antigen complexes (h)
1AHW          5/5                  2/2
1BVK          5/0                  5/1
1DQJ          2/2                  1/1
1MLC          0/0                  0/0
1WEJ          0/2                  3/1
1BQL          5/5                  1/1
1EO8          1/4                  0/0
1FBI          3/3                  2/3
1IAI (c)      0/1                  2/2
1JHL          1/0                  1/1
1MEL (c)      5/5                  3/4
1NCA          5/5                  3/3
1NMB (c)      5/5                  0/0
1QFU          5/5                  4/4
2JEL          5/4                  5/5
2VIR (c)      4/1                  3/3

Other complexes (i)
1AVZ          0/0                  1/0
1MDA          3/0                  2/1
1WQ1          3/4                  4/4
2PCC          3/1                  1/3
1A0O          1/4                  3/1
1ATN          5/5                  5/5
1GLA          1/1                  0/0
1IGC          2/2                  0/0
1SPB          5/5                  5/5
2BTF          4/4                  5/4

Difficult targets (j)
1BTH (c)      0/1                  0/0
1FIN          0/0                  0/0
1FQ1          2/2                  3/5
1GOT          0/0                  0/0
1EFU          0/0                  0/0
3HHR          0/0                  1/0

Total (k)     34/32                32/30
Subset (l)    13/12                12/12

(a) Bolded targets are decoys from docking between unbound and bound
structures [7]. Others are between unbound and unbound structures. (b) The
enzyme/inhibitor complexes. (c) The complexes that were not used for optimizing
the weighting scores in the RosettaDock energy function. (d) The
high-resolution RosettaDock scoring function [19, 20]. (e) The first (second)
number in the cell is the number of top 5 decoys with rmsd < 10 Å (more than
25% of native residue-residue contacts) given by the RosettaDock scoring
function. (f) The DFIRE-based potential derived from a database of single-chain
proteins. (g) The first (second) number in the cell is the number of top 5
decoys with rmsd < 10 Å (more than 25% of native residue-residue contacts)
given by the DFIRE scoring function. (h) The antibody/antigen complexes.
(i) Other complexes. (j) Difficult targets. (k) The success rate based on the
number of targets that have three or more decoys with rmsd < 10 Å (or with more
than 25% native contacts) ranked in the top 5, as in Ref. [20]. (l) The success
rates of the independent subset of complexes that were not used in weight
optimization.

Table 2: Highest rank of hits in ZDOCK docking decoys (a)

PDB ID | ZDOCK1.3 DECOYS (b): GDE (e), DFIRE (f) | ZDOCK2.1 DECOYS (c): PSC (g), DFIRE (f) | ZDOCK2.3 DECOYS (d): PDE (h), DFIRE (f)



1MLC 


134 


64 


1WEJ 


1940 


159 


1AHW 


11 


10 


1DQJ 


- 


- 


1BVK 


- 


- 


1FBP 


561 


812 


2JEL* 


- 


- 


1BQL ! 


1 


21 


1JHL ! 


- 


- 


1NCA 1 


211 


90 


1NMB 1 


1108 


329 


1MEL 1 


9 


1 


2VIR 1 


- 


- 


1E08* 


- 


- 


1QFU 1 


606 


48 


1IAP 


905 


102 


1CGI 


3 


4 


1CHO 


22 


1 


2PTC 


65 


11 


1TGS 


5 


1 


2SNI 


169 


331 


2SIC 


2 


126 


1CSE 


3 


30 


2KAI 


1772 


1044 


1BRC 


52 


25 


1ACB 


3 


25 


1BRS 


1019 


211 


1MAH 


9 


12 


1UGH 


11 


30 


1DFJ 


2 


18 


1FSS 


1066 


113 


1AVW 


704 


37 


1PPE 1 


1 


1 


1TAB 1 


- 


- 


1UDP 


198 


33 


1STF 8 


1 


1 


2TEC 1 


1 


2 


4HTC 1 


2 


2 


2PCC 


702 


234 


1WQ1 


131 


82 


1AVZ 


- 


- 


1MDA 


- 


- 


1IGC 8 


- 


- 


1ATN J 


13 


2 


1GLA 1 


214 


53 


1SPB S 


1 


2 


2BTF 8 


27 


1 


1A0O 1 


619 


108 


Ratio J 


- 


12/23 


Top l k 


4 


6 



1396 


141 


1106 


406 


26 


5 


1341 


312 


974 


1386 


1786 


1619 


112 


214 


172 


28 


101 


116 


2 


11 


693 


215 


12 


1 


176 


125 



107 



4 


7 


1 


1 


1655 


715 


3 


4 


241 


13 


1537 


37 


1399 


261 


173 


13 


25 


33 


61 


11 


819 


39 


305 


316 


37 


11 


731 


259 


45 


16 


1 


1 


65 


6 


31 


1 


1 


1 


1 


1 


1 


1 



22 
360 


239 

127 


1 

32 

833 


5 
1 
139 


6 


9/27 
9 



128 


146 


183 


36 


7 


4 


821 


1239 


642 


1418 


233 


1030 


13 


31 


333 


51 


1 


152 


135 


28 


3 


1 


1101 


315 


1497 


111 


388 


1 


997 


299 


1 


78 


3 


1 


193 


15 


3 


9 


1262 


913 


11 


95 


198 


9 


388 


212 


21 


13 


18 


70 


65 


131 


21 


11 


8 


22 


1 


26 


50 


97 


3 


16 


1 


1 


79 


5 


5 


1 


1 


1 


1 


1 


3 


3 



15 



153 
7 


152 
7 


1 
2 

284 


13 
1 

218 


6 


18/20 

7 



(a) The 1JTG decoy set is not available (see
http://zlab.bu.edu/~rong/dock/software.shtml). Hits are defined as docked
structures with interface rmsd < 2.5 Å from the crystal complexes. There are
2000 decoys for each target. (b) Decoys generated by ZDOCK1.3 [9]. (c) Decoys
generated by ZDOCK2.1. (d) Decoys generated by ZDOCK2.3. (e) ZDOCK1.3(GDE).
(f) The DFIRE-based potential [55]. (g) ZDOCK2.1(PSC). (h) ZDOCK2.3(PDE).
(i) Decoys from docking between unbound and bound structures. (j) The first
(second) number is the number of targets whose ranks given by DFIRE are lower
(higher) than those given by the ZDOCK scoring function. (k) The number of
targets whose near-native structures are scored as top 1.

Table 3: Hits scored in the top 1000 of ZDOCK docking decoys (a)

PDB ID | ZDOCK1.3(GDE) DECOYS (b): GDE (e), DFIRE (f), SoftDFIRE (g) | ZDOCK2.1(PSC) DECOYS (c): PSC (h), DFIRE (f), SoftDFIRE (g)


1MLC 


11 


11 


14 





3 


3 


1WEJ 





1 








3 


4 


1AHW 


41 


63 


57 


18 


21 


26 


1DQJ 


- 


- 


- 





1 





1BVK 


- 


- 


- 


1 








1FBP 


1 


2 


2 








1 


2JEL-' 


- 


- 


- 


33 


21 


13 


1BQLJ 


73 


68 


85 


6 


12 


11 


1JHLJ 


- 


- 


- 


7 


10 


6 


1NCA-? 


6 


9 


7 


43 


31 


11 


1NMB-* 





3 


2 


3 


5 





1MELJ 


19 


30 


32 


36 


17 


52 


2VHV 


- 


- 


- 


1 


3 





1E08 J ' 


- 


- 


- 


- 


- 


- 


1QFIP 


■1 


1 


4 


2 


10 


10 


HAP 


1 


3 


3 


- 


- 


- 


1CGI 


13 


52 


70 


29 


37 


12 


1CH0 


53 


85 


76 


54 


65 


65 


2PTC 


38 


43 


61 





2 


1 


1TGS 


60 


72 


86 


87 


85 


106 


2SNI 


34 


19 


46 


- 


- 


- 


2SIC 


96 


53 


105 


10 


20 


16 


1CSE 


61 


30 


48 





3 





2KAI 








1 





2 


2 


1BRC 


9 


17 


20 


13 


16 


16 


1ACB 


154 


120 


136 


21 


25 


32 


1BRS 





1 


3 


20 


25 


32 


1MAH 


15 


10 


51 


3 


6 


6 


1UGH 


36 


13 


53 


2 


3 


4 


1DFJ 


36 


9 


13 


13 


10 


15 


1FSS 





2 


2 


1 


4 


4 


1AVW 


2 


2 


2 


18 


21 


27 


1PPEJ 


257 


143 


2.30 


215 


198 


218 


1TAB-? 


- 


- 


- 


31 


40 


29 


1UDP 


28 


29 


32 


13 


13 


16 


1STFJ 


140 


120 


113 


37 


39 


12 


2TEC' 


191 


138 


168 


64 


69 


57 


4HTC J 


53 


54 


65 


40 


-17 


52 


?W& 


§ 


§ 


lo 


21 


17 


22 


1AVZ 


- 


- 


- 


- 


- 


- 


1MDA 


- 


- 


- 


- 


- 


- 


1IGC J 


- 


- 


- 


1 


1 


1 


lATrP 


30 


44 


40 


1 


1 





1GLA-7 


8 


19 


19 


- 


- 


- 


1SPBJ 


85 


84 


102 


59 


59 


73 


2BTFJ 


15 


11 


14 


11 


13 


13 


1A0OJ 


4 


9 


6 


1 


2 


2 


Ratio fc 


- 


12/22 


5/30 


- 


7/29 


9/30 


Average' 


34.25 


30.19 


38.40 


19.13 


20.73 


22.71 



ZDOCK2.3(PDE) DECOYS (d)
PDE (i) | DFIRE (f) | SoftDFIRE (g)



11 


8 


11 


19 


50 


56 


1 





1 





17 





16 


30 


5 


9 


55 


25 


7 


7 


52 


58 





3 





2 


10 


18 


1 


3 


50 


12 


68 


85 


28 


25 


79 


82 





1 


38 


28 


15 


22 


3 


13 


35 


47 


54 


51 


11 


11 


22 


21 


14 


14 


44 


1-1 


11 


10 


39 


34 


325 


2.33 


28 


39 


26 


30 


67 


67 


151 


127 


36 


11 



10 



29 



9 

21 

59 

1 
3 

19 
51 
7 

-17 

2 

71 



1 

18 

2 

53 

79 

33 

105 



30 

8 

11 

49 

54 

16 

26 

20 

15 

13 

49 

296 

31 

35 

77 

109 

52 

16 



2 


2 


1 


16 


23 


2 


98 


71 


105 


28 


33 


30 


4 


3 


2 


- 


18/20 


10/29 


33.38 


30.10 


35.23 



(a) The 1JTG decoy set is not available (see
http://zlab.bu.edu/~rong/dock/software.shtml). Hits are defined as docked
structures with interface rmsd < 2.5 Å from the crystal complex. There are 2000
decoys for each target. (b) Decoys generated by ZDOCK1.3 [9]. (c) Decoys
generated by ZDOCK2.1. (d) Decoys generated by ZDOCK2.3. (e) ZDOCK1.3(GDE).
(f) The DFIRE-based potential [55]. (g) The Soft-DFIRE potential.
(h) ZDOCK2.1(PSC). (i) ZDOCK2.3(PDE). (j) Decoys from docking between unbound
and bound structures. (k) The first (second) number is the number of targets
whose number of hits given by DFIRE/Soft-DFIRE is lower (higher) than that
given by the ZDOCK scoring function. (l) The average number of hits over 48
targets.

Figure 1: Scatter plots of the DFIRE score versus rmsd of RosettaDock decoys
from the native structure (based on Cα atoms). Results of two proteins (1stf at
the top left and 1nca at the top right) from the enzyme/inhibitor complexes,
two proteins (1bvk at the middle left and 1qfu at the middle right) from the
antibody/antigen complexes, and two proteins (1spb at the bottom left and 2btf
at the bottom right) from the other complexes are shown.

Figure 2: The performance of ZDOCK1.3 (left), ZDOCK2.1 (middle), and ZDOCK2.3
(right) is compared with that of DFIRE according to success rates as a function
of the number of predictions (number of energy-ranked structures) in 16
antibody-antigen decoy sets (top), 22 enzyme-inhibitor decoy sets (upper
middle), 10 other-complex decoy sets (lower middle), and all 48 decoy sets
(bottom).

Figure 3: The theoretically predicted binding free energies versus the
experimentally measured ones. The solid line is from a linear regression fit
with a correlation coefficient of 0.79 and an rmsd of 2.35 kcal/mole. The
dashed line indicates where the points would lie if there were perfect
agreement.